Abstract: We propose and evaluate an immuno-inspired approach
to misbehavior detection in ad hoc wireless networks.
Node misbehavior can be the result of an intrusion, or a software 
or hardware failure. Our approach is motivated by co-stimulatory 
signals present in the Biological immune system. The results
show that co-stimulation in ad hoc wireless networks can both
substantially improve the energy efficiency of detection and, at the
same time, help achieve low false positive rates. The energy
efficiency improvement is almost two orders of magnitude
compared with misbehavior detection based on watchdogs.

We provide a characterization of the trade-offs between 
detection approaches executed by a single node and by several 
nodes in cooperation. Additionally, we investigate several feature 
sets for misbehavior detection. These feature sets impose different 
requirements on the detection system, most notably from the 
energy efficiency point of view. 

I. Introduction

Ad hoc wireless networks can be subject to a large variety of
attacks or intrusions. These attacks can range from simple
packet dropping to advanced attacks executed in collusion,
possibly utilizing a computational platform superior to that
of the attacked network. Secure protocols aim to prevent all
or most of these attacks. Experience from the Internet, however,
shows that flaws in such protocols are continuously being found
and exploited [1].

Performance analysis of security and protection solutions
for ad hoc wireless networks has received a lot of interest from
the community; see [2], [3] for a review. It is currently unclear
to what extent ad hoc wireless networks will be subject to
various attacks. The history of security of home and small
mobile computing platforms, however, shows that such
attacks can disrupt or even completely interrupt the normal
operation of networks [4].

In the future, protecting ad hoc wireless networks can
become as challenging a task as protecting home computing
platforms. Many ad hoc networks are expected to be based
on wireless devices with restricted computational and
communication capabilities and very limited battery resources. In
many application scenarios, attack signature updates from a
centralized site are infeasible. Correcting the consequences
of some failures or attacks might only be possible through
costly human intervention, or not at all. Underwater networks [5]
are an example where such a correction might be nearly impossible.



The above facts establish the basic motivation for designing
autonomous detection and response systems that aim
at offering an additional line of defense to the employed
secure protocols. Such systems should provide several layers
of functionality, including the following: (i) distributed
self-learning and self-tuning with the aim of minimizing the
need for human intervention and maintenance, and (ii) active
response focused on attenuating and possibly eliminating
the negative effects of misbehavior on the network.

The ability to self-learn and self-tune presupposes a set of
observable features from which it can be deduced whether
a member of an ad hoc network misbehaves or whether the
conditions in the given ad hoc network could lead to decreased
Quality of Service for others. An important task is therefore to
identify a set of features that are useful in detecting a broad
range of misbehavior types.

Current best practices for misbehavior detection in ad hoc
wireless networks rely almost exclusively on domain
knowledge; see [2], [3], [6] and references therein.
Although such an approach allows one to find a good predictor
for a specific type of misbehavior, it fails to deliver broader
knowledge about the design of misbehavior detection systems.

Our goal was to benefit from the wealth of information
available at the various layers of the OSI protocol stack
and to provide a performance assessment of misbehavior
detection done by a single node or by several nodes acting
cooperatively and exchanging network measurements (features).
We divided the examined feature set into several subsets with
respect to their energy efficiency and protocol assumptions. We
employed a wrapper method [7] to assess the efficiency of
individual feature subsets. Most importantly, we investigated the
applicability of a mechanism inspired by the efficiency of the
Biological immune system (BIS). The BIS and its technical
counterpart, Artificial immune systems (AIS), are currently under
increased examination in the area of wireless network security [8],
[9], [10]. Our approach is motivated by the mechanisms that
allow the key players of the BIS, such as T-cells, B-cells,
dendritic cells and macrophages, to communicate with each
other and thus provide a more robust mechanism for detecting
and eliminating foreign agents such as viruses or bacteria.
We focus especially on the remarkable ability of the
BIS to avoid false positives in the classification process [11].
This is demonstrated by the rareness of severe auto-immune
reactions in humans (although milder forms of allergies are
unfortunately rather frequent).

This document is organized as follows. In Section II we give
a short overview of the Biological immune system. Section III
offers a summary of the related work. In Section IV our
evaluation approach is introduced. In Section V we summarize
the assumptions and protocols relevant to our experiments.
Section VI defines the features used in the performance
evaluation. In Section VII our immuno-inspired architecture
is introduced and analyzed. Section VIII describes the
experimental setup in detail. In Section IX we discuss the obtained
results. Finally, in Section X we conclude and give an
outlook on future research.

II. The Biological Immune System 

The Biological immune system (BIS) [11] can quickly
recognize the presence of foreign microorganisms in the human
body. It is remarkably efficient, most of the time, in correctly
detecting and eliminating pathogens such as viruses, bacteria,
fungi or parasites, and in choosing the correct immune
response. When confronted with a pathogen, the BIS relies on
the coordinated response of its two vital parts:

• the innate system: the innate immune system is able to 
recognize the presence of a pathogen or tissue injury, and 
is able to signal this to the adaptive immune system. 

• the adaptive system: the adaptive immune system can 
develop during the lifetime of its host a specific set of 
immune responses. 

For an immune reaction to occur, it is necessary that (i)
a cell has been classified as a pathogen and (ii) this cell
could cause some damage to the human organism. This means
that the BIS only reacts to infectious cells, i.e. to
pathogens that can indeed cause harm [11].

This indicates that two-way communication between the innate
and adaptive immune systems, hereafter referred to as
co-stimulation, is common. Immunologists such as Frauwirth
and Thompson describe co-stimulation as the involvement of
"reciprocal and sequential signals between cells" in order to
fully activate a lymphocyte [12]. The role of lymphocytes is
to recognize a specific pathogen and to trigger a corresponding
immune reaction; some forms of lymphocytes are also capable of
eliminating pathogens.

In the subsequent sections, we introduce and evaluate
an approach inspired by the interplay between the innate and
adaptive immune systems. The goal of this approach is to help
suppress false positives while achieving energy efficiency.

III. Related Work 

A. AIS Based Misbehavior Detection 

Early work on adapting the BIS to networking was
done by Stephanie Forrest and her group at the University of
New Mexico. In one of the first BIS-inspired works, Hofmeyr
and Forrest [13] described an AIS able to detect anomalies
in a wired TCP/IP network. In their setup, co-stimulation was
provided by a human operator who was given 24 hours to confirm
a detected attack.

Sarafijanovic and Le Boudec [8] introduced an AIS for
misbehavior detection in mobile ad hoc wireless networks.
They used four different features based on the network layer
of the OSI protocol stack and achieved a detection rate of
about 55%; only simple packet dropping at different rates was
considered as misbehavior. Co-stimulation in the form of a
danger signal emitted by a connection source was used to inform
nodes on the forwarding path about perceived data packet loss.

An AIS for sensor networks was proposed by Drozda et al.
in [10]. The implemented misbehavior was packet dropping;
the detection rate was about 70%.

Classification techniques proposed in [13], [8], [14], [10]
were based on negative selection, a learning mechanism
applied in the training and priming of T-cells in the thymus.
In the computational approach to negative selection due to
D'haeseleer et al. [15], a complement to an n-dimensional
vector set is constructed. This is done by producing random
vectors and testing them against vectors in the original vector
set. If a random vector does not match anything in the original
set, it becomes a member of the complement (detector) set.
The vectors from the detector set are then used to identify
anomalies (misbehavior). Only very recently, an efficient
algorithm for negative selection was presented by Elberfeld and
Textor [16].
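The generate-and-test construction of the detector set described above can be sketched as follows. This is only an illustration under our own assumptions: binary vectors, the r-contiguous-bits matching rule often used with negative selection, and hypothetical parameter values; it is not the implementation of [15] or [16].

```python
import random


def matches(a, b, r):
    """r-contiguous-bits rule (our assumption): a and b match if they
    agree on at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n, r, num_detectors, max_tries=100_000, seed=0):
    """Generate-and-test negative selection: random candidates that match
    no vector of the self (normal-behavior) set become detectors."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == num_detectors:
            break
        candidate = tuple(rng.randint(0, 1) for _ in range(n))
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_anomalous(vector, detectors, r):
    """A vector is flagged as non-self (possible misbehavior) if any
    detector matches it."""
    return any(matches(vector, d, r) for d in detectors)


# Usage: two 8-bit "self" vectors and 4-contiguous-bits matching.
self_set = [(0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1, 1, 1)]
detectors = generate_detectors(self_set, n=8, r=4, num_detectors=5)
```

By construction, no detector matches a self vector, so normal behavior seen during training is never flagged; the cost of this naive scheme is the number of random candidates that must be rejected, which is what more efficient algorithms such as [16] address.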

An approach based on the Danger theory was proposed by
Kim et al. in [9]. Several types of danger signals, each with
a different function, are employed in order to detect routing
manipulation in wireless sensor networks. The authors did not
undertake any performance analysis of their approach.

It is beyond the scope of this document to provide an
exhaustive summary of AIS-based approaches. A review of
the theoretical aspects of several BIS-inspired algorithms was
compiled by Timmis et al. in [17]. The various application
areas of BIS-inspired approaches were reviewed by D. Dasgupta
in [18]. The applicability of such approaches to ad hoc networks was
reviewed in [19].

Even though the BIS seems to be a good inspiration for
improving misbehavior detection in ad hoc and sensor networks,
approaches based on machine learning and similar methods have
received much more attention; see [6], [?] and the references
therein. Despite recent efforts, energy-efficient misbehavior
detection remains an open problem. In the following
sections, we use the watchdog approach due to Marti et al. [20]
as a basis for our misbehavior detection and energy efficiency
evaluation. Even though the watchdog approach is not energy
efficient, it has received great attention in the literature and is
thus perceived by many as the standard approach.
Fig. 1. Wrapper approach.



B. Cascading Classifiers 

The misbehavior classification approach that we present
and analyze here resembles the "cascading classifiers" of
Kaynak and Alpaydin introduced in [21]. Their approach is
based on a sequential application of several classifiers such
that "at the next stage, using a costlier classifier, we build a
more complex rule to cover those uncovered patterns of the
previous stage". Our BIS-inspired approach can be seen as
an instance of cascading classification. However, it is not the
complexity of the classification rules that increases at each step
but the energy cost of observing the additional states
and events necessary for more precise reasoning about a
possible misbehavior.

Cascading classifiers were empirically studied by Gama and
Brazdil in [22]. They combined several types of classifiers:
a Bayes classifier, C4.5 and a linear discriminant function. Their
focus was to investigate whether cascading classification
could offer a classification performance improvement over
classification using a single classifier.

IV. Evaluation Approach 

To evaluate the performance of our immuno-inspired
architecture, we use the wrapper approach described in [7].
The wrapper approach requires a training and a test data set
as input. In our case, these data sets are obtained by
simulating an ad hoc network. Our network simulator of choice
was JiST/SWANS [23]. SWANS is a Java-based network
simulator that offers a substantial simulation speed-up over
other alternatives, most notably GloMoSim and ns-2. Vectors in
both sets are labeled as representing either normal
behavior or behavior when the network is subject to
an attack or intrusion.



Since the feature space can be large, it is desirable to
identify features that can significantly contribute to misbehavior
detection. This is done by running an optimization algorithm
over the feature space. The goal of the optimization is to find a
(small) feature subset that is efficient in detecting misbehavior.
In our case, the input to the optimization algorithm is encoded
as a bit vector of length $k$, where $k$ is the number of features; a
bit at position $i$ set to 1 means that the $i$-th feature is in the
feature subset currently evaluated. The wrapper approach in our
setup is depicted in Fig. 1. It can be summarized as follows:

1) Training and test data sets are produced through simulation.

2) An optimization algorithm computes a subset of the feature
space that is expected to be optimal with respect to
one or several performance measures. The optimization
algorithm is initialized with an empty (all bits set
to zero) or a random bit vector.

3) An induction algorithm uses a part of the training set
to learn a classifier. This classifier is applied to the
remaining part (hold-out set), and the performance measures are
computed. The optimization algorithm is supplied with the
performance measures of the classifier using the current
feature subset. Steps (2) and (3) are repeated until a
termination condition is met.

4) Finally, the optimal feature subset is thoroughly evaluated
on the test set, using the same induction algorithm as in Step (3).
This optimal feature subset, together with its expected
performance, is output.

The training and test sets and the hold-out set in Step (3)
were produced using stratified n-fold cross-validation [24].
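Steps (2) to (4) of the wrapper loop can be sketched as below. This is a minimal sketch under stated assumptions rather than the setup used in the experiments: we assume a greedy bit-flip hill climber as the optimization algorithm, a nearest-centroid rule as the induction algorithm and a single hold-out split instead of n-fold cross-validation; all function names are hypothetical.

```python
from statistics import mean


def holdout_error(train, holdout, mask):
    """Induction + evaluation step: learn per-class centroids on the
    features selected by `mask` and return the hold-out error rate."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 1.0  # empty feature subset: treat as worst case
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(row[i] for row in rows) for i in idx]

    def dist(x, label):
        return sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[label]))

    wrong = sum(min(centroids, key=lambda c: dist(x, c)) != y for x, y in holdout)
    return wrong / len(holdout)


def wrapper_select(train, holdout, k):
    """Greedy bit-flip search over a length-k bit vector, starting from
    the empty subset (all bits zero); stops when no single flip lowers
    the hold-out error."""
    mask, best = [0] * k, holdout_error(train, holdout, [0] * k)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            trial = mask.copy()
            trial[i] ^= 1
            err = holdout_error(train, holdout, trial)
            if err < best:
                mask, best, improved = trial, err, True
    return mask, best


# Feature 0 separates the classes, feature 1 is noise.
train = [((0.0, 5.0), "normal"), ((0.2, 1.0), "normal"),
         ((1.0, 5.0), "misbehavior"), ((0.8, 1.0), "misbehavior")]
holdout = [((0.1, 1.0), "normal"), ((0.9, 5.0), "misbehavior")]
```

On this toy data, `wrapper_select(train, holdout, 2)` keeps only the informative feature. Any other search strategy (e.g. a genetic algorithm) or induction algorithm can be plugged into the same loop, since the wrapper treats both as black boxes.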

V. Protocols and Definitions 

A. Protocols 

We now state several protocols, mechanisms and assumptions
relevant to our experiments. Node misbehavior can be
the result of an intrusion, or a software or hardware failure.
Additional faults in ad hoc networks can be introduced by
mobility, signal propagation, link reliability and other factors.
The reasons for nodes (possibly fully controlled by an attacker)
to execute any form of misbehavior can range from the desire
to save battery power to the intention of making an ad hoc
network non-operational.

We consider AODV [25], a well-known on-demand routing
protocol that uses the RREQ and RREP handshake to establish
routes, as the underlying routing protocol.

At the MAC (Medium access control) layer, the contention-based
IEEE 802.11 MAC protocol using both carrier sensing
and the RTS-CTS-DATA-ACK handshake is considered. If
the medium is not available or the handshake fails, an
exponential back-off algorithm is used. This is combined with
a mechanism that makes it easier for neighboring nodes to
estimate transmission durations: duration values are exchanged
and subsequently stored in a data structure known as the
Network allocation vector (NAV).
Alternative MAC protocols such as the IEEE 802.15.4 MAC protocol
avoid the RTS-CTS-DATA-ACK handshake and rely only
on carrier sensing to access the medium, in order to
decrease energy consumption at nodes.

User Datagram Protocol (UDP) is a transport layer protocol 
that does not guarantee any reliability, data integrity or data 
packet ordering. 

In promiscuous mode, a node listens to the ongoing traffic
among other nodes in the neighborhood and collects information
from the overheard packets. Promiscuous mode is energy
inefficient because it prevents the wireless interface from operating
in sleep mode, forcing it into either idle or receive mode;
there is also extra overhead caused by analyzing all overheard
packets. According to [26], power consumption in idle and
receive modes is about 12 to 20 times higher than in sleep mode.

We do not assume any time synchronization among nodes.
We assume that packets are authenticated, i.e. the sender of
any packet can be identified and any changes to the packet body
can be detected. This is a reasonable assumption, in line
with, e.g., the ZigBee specification [27].

B. Performance Measures 

We evaluate the misbehavior classification performance in
terms of the detection rate and the false positives (FP) rate. We
assume that a classifier $\mathcal{K}$ computed by a learning algorithm
is used in the classification process. The classifier $\mathcal{K}$ is used
to classify the objects $\Omega = \{o_1, \ldots, o_p\}$, where $p$ is the
number of objects. The two measures are then computed as follows:

$$\text{det. rate}^{\Omega}_{c_j}(\mathcal{K}) = \frac{TP_{c_j}}{n_{c_j}} \times 100.0\% \tag{1}$$

$$\text{FP rate}^{\Omega}_{c_j}(\mathcal{K}) = \frac{FP_{c_j}}{TP_{c_j} + FP_{c_j}} \times 100.0\% \tag{2}$$

where $c_j$ is the $j$-th class and $n_{c_j}$ is the number of objects
labeled with the class $c_j$; note that $n_{c_j} > 0$ in all our
experiments. $TP_{c_j}$ is the number of objects that were correctly
classified by the learning algorithm as belonging to the
class $c_j$. $FP_{c_j}$ is the number of objects incorrectly predicted
as belonging to $c_j$. 95% confidence intervals ($CI_{95\%}$) were
computed for each measure.

The overall misclassification rate is evaluated by means of
the classification error:

$$\text{class. error}^{\Omega}(\mathcal{K}) = \frac{\sum_{j} FP_{c_j}}{p} \times 100.0\% \tag{3}$$
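The measures in Eqs. (1) to (3) can be computed from true and predicted labels as in the following sketch. It assumes that the FP rate normalizes $FP_{c_j}$ by $TP_{c_j} + FP_{c_j}$; the function names are hypothetical and the code is illustrative only.

```python
def class_rates(y_true, y_pred, c):
    """Eqs. (1)-(2): detection rate and FP rate (in %) for class c."""
    pairs = list(zip(y_true, y_pred))
    n_c = sum(1 for t, _ in pairs if t == c)          # objects labeled c (> 0)
    tp = sum(1 for t, p in pairs if t == c and p == c)
    fp = sum(1 for t, p in pairs if t != c and p == c)
    det_rate = 100.0 * tp / n_c
    fp_rate = 100.0 * fp / (tp + fp) if tp + fp else 0.0
    return det_rate, fp_rate


def classification_error(y_true, y_pred):
    """Eq. (3): sum_j FP_{c_j} / p in %. Each misclassified object is an
    FP for exactly one class (its predicted class), so the sum equals the
    number of misclassified objects."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


y_true = ["normal", "normal", "normal", "misbehavior", "misbehavior"]
y_pred = ["normal", "misbehavior", "normal", "misbehavior", "normal"]
```

Here one of the two misbehaving windows is detected and one normal window is wrongly flagged, so `class_rates(y_true, y_pred, "misbehavior")` gives a 50% detection rate and a 50% FP rate, while the classification error over all five objects is 40%.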

VI. The Features 

We considered 24 features from three layers of the OSI
protocol stack: the data link (MAC), network and transport layers.
Our focus was on two basic types of features: (i) performance-related
features, e.g. latency or throughput, and (ii) network
topology related features, e.g. node degree, network diameter or
average path length to a destination as recorded in the routing
table. We also included previously well-known features such
as the watchdog feature [20]; some others were motivated by
the results published in [3], [10]. The features not adapted
from [20], [3], [10] were selected by considering the protocols
used and choosing features that do not add too much
computational overhead. No formal method (Petri
nets etc.) was applied in this process.

Let $s_s, s_1, \ldots, s_i, s_{i+1}, \ldots, s_d$ be the path between $s_s$
and $s_d$ determined by a routing protocol, where $s_s$ is the
source node and $s_d$ is the destination node. The features in the
feature set $f$ introduced below are averaged over a sliding
time window of size win_size. Let $pcts_{TX}$ and $pcts_{RX}$
be the number of data packets sent and received by $s_i$ in a
time window of size win_size, respectively.
MAC Layer Features: 
M1 MAC handshake ratio: Computed by $s_i$ as:

$$M1 = \frac{1}{pcts_{TX}} \sum_{p=1}^{pcts_{TX}} \frac{\#ACK_p}{\#RTS_p}$$

where $\#RTS_p$ is the number of RTS packets sent
to $s_{i+1}$ and $\#ACK_p$ is the number of ACK packets
received by $s_i$ from $s_{i+1}$ when reserving the wireless
medium for the packet $p$. This feature estimates the
medium congestion level from the number of handshakes
that were brought to completion, i.e. that did not end
before an ACK was received.
M2 Back-off level index: The MAC protocol back-off level $BO_p$
just before a data packet $p$ is transmitted from $s_i$ to $s_{i+1}$,
averaged over all data packets sent:

$$M2 = \frac{\sum_{p=1}^{pcts_{TX}} BO_p}{pcts_{TX}}$$

M2 is similar to M1; the congestion level is, however,
estimated from the number of back-offs.
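M3 Forwarding index (watchdog): Ratio between the number of
data packets subsequently forwarded by $s_{i+1}$ to $s_{i+2}$, $TX_{s_{i+2}}$,
and the number of data packets sent from $s_i$ to $s_{i+1}$, $TX_{s_{i+1}}$:

$$M3 = \frac{TX_{s_{i+2}}}{TX_{s_{i+1}}}$$

The data packets that have $s_{i+1}$ as the destination node
are excluded from this statistic.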
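M4 Processing delay: Time delay that a data packet
accumulates at $s_{i+1}$ before being forwarded to $s_{i+2}$,
averaged over all data packets:

$$M4 = \frac{\sum_{p=1}^{pcts_{TX}} delay_p}{pcts_{TX}}$$

where $delay_p$ is the processing delay of the data packet $p$ at $s_{i+1}$.
The data packets that have $s_{i+1}$ as the destination node
are excluded from this statistic.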
M5 Data rate index: Amount of data (in bits/s) forwarded
by node $s_i$ in a time window:

$$M5 = \frac{\sum_{p} size(p)}{\mathit{win\_size}}$$

where $size(p)$ is the size of a data packet $p$ that is
forwarded in the given time window. The data packets
that originate at $s_i$ are excluded.
M6 Node degree index: Number of neighboring nodes with
which $s_i$ had an active data exchange:

$$M6 = \#\mathit{neigh}$$

where $\#\mathit{neigh}$ is the number of such neighbors in a time
window.

M7 2-hop neighborhood index: Number of neighboring
nodes within 2-hop distance from $s_i$:

$$M7 = \#\mathit{2neigh}$$

where $\#\mathit{2neigh}$ is the number of unique MAC layer
destinations extracted from overheard data packets in a
time window.
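To illustrate how per-packet MAC observations reduce to one value per time window, the sketch below computes M1, M2 and M5. The `TxRecord` structure and its field names are hypothetical (they do not correspond to any JiST/SWANS API), and for brevity M5 is computed only over packets sent to $s_{i+1}$ rather than over all packets forwarded by $s_i$.

```python
from dataclasses import dataclass


@dataclass
class TxRecord:
    """Hypothetical per-packet record kept by s_i for a data packet p
    sent to s_{i+1} during the current time window."""
    rts: int          # #RTS_p: RTS packets sent to s_{i+1} for p (>= 1)
    acks: int         # #ACK_p: ACK packets received from s_{i+1} for p
    backoff: int      # BO_p: back-off level just before p was transmitted
    size_bits: int    # size(p) in bits
    forwarded: bool   # False if p originated at s_i (excluded from M5)


def mac_features(records, win_size):
    """Return (M1, M2, M5) for one time window of win_size seconds."""
    pcts_tx = len(records)
    m1 = sum(r.acks / r.rts for r in records) / pcts_tx
    m2 = sum(r.backoff for r in records) / pcts_tx
    m5 = sum(r.size_bits for r in records if r.forwarded) / win_size
    return m1, m2, m5


window = [
    TxRecord(rts=1, acks=1, backoff=0, size_bits=8000, forwarded=True),
    TxRecord(rts=2, acks=1, backoff=1, size_bits=8000, forwarded=True),
    TxRecord(rts=1, acks=0, backoff=2, size_bits=4000, forwarded=False),
]
```

For this window, a completed first handshake, a handshake needing two RTS attempts and a failed handshake give M1 = 0.5, the back-off levels give M2 = 1.0, and the two forwarded 8000-bit packets over a 2 s window give M5 = 8000 bits/s.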

Routing Layer Features: 
R1 Forwarding index for RREQ: Number of unique
RREQs (i.e. with a unique source id and sequence
number) forwarded by the node $s_i$, normalized by the
time window size:

$$R1 = \frac{\#RREQ}{\mathit{win\_size}}$$

R2 Forwarding index for RERR: Number of RERRs
forwarded by the node $s_i$, normalized by the time
window size:

$$R2 = \frac{\#RERR}{\mathit{win\_size}}$$

R3 Forwarding index for RREP: The same ratio as in M3,
but computed separately for RREP routing packets.
R4 Processing delay for RREP: The same delay as in M4,
but computed separately for RREP routing packets.
R5 Average distance to destination: Average number of
hops from $s_i$ to any destination as recorded in the routing
table:

$$R5 = \frac{\sum_{k=1}^{\mathit{table\_size}} |r_k|}{\mathit{table\_size}}$$

where $|r_k|$ is the length of the route to the $k$-th destination
and table_size is the number of destinations recorded.
R6 Routing activity index: Number of RREQ, RREP and
RERR packets received by node $s_i$, normalized by the
time window size:

$$R6 = \frac{\#RREQ + \#RREP + \#RERR}{\mathit{win\_size}}$$

R7 Connectivity index 1: Number of unreachable destinations
$\#\mathit{unreach}$ as recorded in the routing table of node $s_i$:

$$R7 = \#\mathit{unreach}$$

R8 Connectivity index 2: Number of invalid routes
$\#\mathit{invalid}$ as recorded in the routing table of node $s_i$:

$$R8 = \#\mathit{invalid}$$

R9 Connectivity index 3: Number of destinations $\#\mathit{dest}$
with known routes as recorded in the routing table of
node $s_i$:

$$R9 = \#\mathit{dest}$$

R10 Cut index 1: Number of connections $\#\mathit{connect}$ with
data packets forwarded by node $s_i$ in a time window:

$$R10 = \#\mathit{connect}$$

R11 Cut index 2: Number of RREP packets forwarded by
node $s_i$, normalized by the time window size:

$$R11 = \frac{\#RREP}{\mathit{win\_size}}$$



R12 Diameter index: Number of hops to the furthest
destination as recorded in the routing table of node $s_i$:

$$R12 = \max_k \{|r_k|\}$$

where $k = 1, \ldots, \mathit{table\_size}$.
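Several routing-layer features can be read off a single snapshot of the routing table. The sketch assumes a hypothetical table encoding that maps each destination to a (hop count, state) pair with states `"valid"`, `"invalid"` and `"unreachable"`, and counts valid entries as destinations with known routes for R9; this is not the AODV routing table structure itself.

```python
def routing_table_features(table):
    """table: dict destination -> (hops, state), with state in
    {'valid', 'invalid', 'unreachable'} (hypothetical encoding).
    R5 and R12 are taken over all table_size recorded destinations."""
    hops = [h for h, _ in table.values()]
    table_size = len(hops)
    states = [s for _, s in table.values()]
    return {
        "R5": sum(hops) / table_size if table_size else 0.0,  # mean |r_k|
        "R7": states.count("unreachable"),
        "R8": states.count("invalid"),
        "R9": states.count("valid"),
        "R12": max(hops, default=0),                           # max |r_k|
    }


table = {"a": (1, "valid"), "b": (3, "valid"), "c": (2, "invalid")}
```

Because these features need only locally stored state, they add no communication and no promiscuous listening, which is why they remain in every feature subset considered below.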

Transport Layer Features: 
T1 Out-of-order packet index: Number of DATA packets
received by $s_i$ out of order, normalized by the time window size:

$$T1 = \frac{\#OO}{\mathit{win\_size}}$$

where $\#OO$ is the number of data packets received
out of order. $\#OO$ is incremented if node $s_i$
receives on the connection $c$ a data packet $p_j$ such that
$\mathit{seq\_number}(p_j) - 1 \neq \mathit{seq\_number}(p_{j-1})$, where
$\mathit{seq\_number}(p_j)$ is the sequence number of the data
packet $p_j$. This assumes that the connection source uses
an incremental (or similarly easily predictable) scheme for
computing $\mathit{seq\_number}(p_j)$.
T2 Interarrival packet delay index 1: Average delay
between data packets $p_j$ and $p_{j+1}$ sequentially received
by $s_i$. The delay was computed separately for each
connection, and then a master average was computed:

$$T2 = \frac{\sum_{c=1}^{\#\mathit{connect}} \mathit{avg\_delay}_c}{\#\mathit{connect}}$$

where $\mathit{avg\_delay}_c$ is the average delay for data packets
belonging to the connection $c$. It is defined as:

$$\mathit{avg\_delay}_c = \frac{\sum_{j=1}^{pcts^{c}_{RX} - 1} \mathit{delay}_c(p_{j+1}, p_j)}{pcts^{c}_{RX}}$$

where $pcts^{c}_{RX}$ is the number of data packets received
by $s_i$ on the connection $c$, and $\mathit{delay}_c(p_{j+1}, p_j)$ is the delay
between the data packets $p_{j+1}$ and $p_j$ transported by the
connection $c$.

T3 Interarrival packet delay variance index 1: Variance
of the delay between DATA packets received by $s_i$. The
variance was computed separately for each connection,
and then a master average was computed:

$$T3 = \frac{\sum_{c=1}^{\#\mathit{connect}} \mathit{avg\_var\_delay}_c}{\#\mathit{connect}}$$

where $\mathit{avg\_var\_delay}_c$ is the variance of the delay for
data packets belonging to the connection $c$. It is defined
as:

$$\mathit{avg\_var\_delay}_c = \frac{\sum_{j=1}^{pcts^{c}_{RX} - 1} \left(\mathit{delay}_c(p_{j+1}, p_j) - \mathit{avg\_delay}_c\right)^2}{pcts^{c}_{RX} - 1}$$
Fig. 2. Data traffic measurement model.



T4 Interarrival packet delay index 2: Average delay
between DATA packets received by $s_i$:

$$T4 = \frac{\sum_{j=1}^{pcts_{RX} - 1} \mathit{delay}(p_{j+1}, p_j)}{pcts_{RX}}$$

where $\mathit{delay}(p_{j+1}, p_j)$ is the delay between any two
subsequent data packets $p_{j+1}$ and $p_j$ received by $s_i$.
T5 Interarrival packet delay variance index 2: Variance
of the delay between DATA packets received by $s_i$:

$$T5 = \frac{\sum_{j=1}^{pcts_{RX} - 1} \left(\mathit{delay}(p_{j+1}, p_j) - T4\right)^2}{pcts_{RX} - 1}$$



Occasionally, it was not possible to compute a feature
because there was no traffic between $s_i$ and $s_{i+1}$. Such time
windows were not included in our experiments. The features
T2 and T4, as well as T3 and T5, are identical if only data packets
belonging to a single connection are received by $s_i$.
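The transport-layer features can be sketched from the sequence numbers and arrival times of the data packets received by $s_i$ in one window. The sketch assumes a hypothetical input format of (connection, sequence number, arrival time) tuples, divides sums of interarrival gaps by the packet count and sums of squared deviations by the packet count minus one, and requires at least two packets per connection. It also illustrates that T2 equals T4 and T3 equals T5 when only one connection is present.

```python
from collections import defaultdict


def mean_and_var(times):
    """Mean and variance of interarrival gaps: the sum of gaps is divided
    by the packet count, the sum of squared deviations by the packet
    count minus one (needs >= 2 packets)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(times)
    avg = sum(gaps) / n
    var = sum((g - avg) ** 2 for g in gaps) / (n - 1)
    return avg, var


def transport_features(packets, win_size):
    """packets: (conn, seq_number, arrival_time) tuples in reception order.
    Returns (T1, T2, T3, T4, T5)."""
    by_conn = defaultdict(list)
    for conn, seq, t in packets:
        by_conn[conn].append((seq, t))
    # T1: out-of-order packets, checked against the previous packet per connection
    oo = sum(1 for pkts in by_conn.values()
             for (prev, _), (cur, _) in zip(pkts, pkts[1:])
             if cur - 1 != prev)
    t1 = oo / win_size
    # T2, T3: per-connection statistics, then a master average
    per_conn = [mean_and_var([t for _, t in pkts]) for pkts in by_conn.values()]
    t2 = sum(a for a, _ in per_conn) / len(per_conn)
    t3 = sum(v for _, v in per_conn) / len(per_conn)
    # T4, T5: statistics over all received data packets regardless of connection
    t4, t5 = mean_and_var([t for _, _, t in packets])
    return t1, t2, t3, t4, t5


# Single connection; the packet with sequence number 3 is missing.
packets = [("c", 1, 0.0), ("c", 2, 1.0), ("c", 4, 3.0)]
```

With several interleaved connections, T2 and T3 ignore gaps between packets of different connections, while T4 and T5 mix them; this is why both variants were kept in the feature set.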

The watchdog features M3-M4 and R1-R4 and the M7 feature
can be considered energy inefficient because they rely on
promiscuous mode. Therefore, we decided to consider the
following three subsets of the feature set $f$:

1) $f_0 = f$.

2) $f_1 = f_0 \setminus \{M3, M4, M7, R1, R2, R3, R4\}$. $f_1$ excludes all
features that rely on promiscuous mode.

3) $f_2 = f_1 \setminus \{M1\}$. $f_2$ further excludes features based on
MAC protocols using the RTS-CTS-DATA-ACK handshake.
This subset could be relevant to sensor networks
based on the IEEE 802.15.4 MAC protocol.

To distinguish between a feature set and its numerical
instance computed in a given time window, we introduce the
following "hat" notation: $\hat{f}_0$, $\hat{f}_1$ and $\hat{f}_2$.
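The subset definitions can be checked mechanically. Encoding the 24 features M1-M7, R1-R12 and T1-T5 as strings (our encoding, for illustration only), $f_1$ retains 17 features and $f_2$ retains 16.

```python
# The 24 features of f: 7 MAC, 12 routing and 5 transport layer features.
F = ([f"M{k}" for k in range(1, 8)]
     + [f"R{k}" for k in range(1, 13)]
     + [f"T{k}" for k in range(1, 6)])

PROMISCUOUS = {"M3", "M4", "M7", "R1", "R2", "R3", "R4"}  # need promiscuous mode
RTS_CTS = {"M1"}  # needs the RTS-CTS-DATA-ACK handshake

f0 = set(F)
f1 = f0 - PROMISCUOUS   # no promiscuous mode
f2 = f1 - RTS_CTS       # additionally usable with IEEE 802.15.4-style MACs
```

The nesting $f_2 \subset f_1 \subset f_0$ is what allows a cheap classifier on $f_2$ to be followed by a more expensive one on $f_0$.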

VII. An Immuno-Inspired Approach 
A. General Description 

Our approach is inspired by the co-stimulation mechanism
of the BIS. Since co-stimulation benefits from the interplay
between the innate and adaptive immune systems, we decided
to base misbehavior detection on two different feature sets,
$S_1$ and $S_2$. These two feature sets serve as a basis for two
interconnected classifiers, $\mathcal{K}(S_1)$ and $\mathcal{K}(S_2)$.

The classification is done by sequentially applying $\mathcal{K}(S_1)$
and $\mathcal{K}(S_2)$, where $\mathcal{K}(S_1)$ mimics the classification capability
of the innate immune system and $\mathcal{K}(S_2)$ mimics the
classification capability of the adaptive immune system:

$$\mathcal{K}(S_1) \xrightarrow{\text{co-stimulation}} \mathcal{K}(S_2) \tag{4}$$

Let $C_{S_i}$, $i = 1, 2$, be a feature cost measure that reflects all
feature computation costs induced by the feature set $S_i$. Our
goal was to choose $S_1$ and $S_2$ such that:

$$C_{S_1} < C_{S_2} \tag{5}$$

$$\text{class. error}^{\Omega}(\mathcal{K}(S_1)) > \text{class. error}^{\Omega}(\mathcal{K}(S_2)) \tag{6}$$

This means that a classifier with a lower feature cost
but a higher classification error is applied first. If the classifier
$\mathcal{K}(S_1)$ detects a misbehavior, then the classifier $\mathcal{K}(S_2)$, with a
higher feature cost and a lower classification error, is applied.
Hereafter, we refer to the conditional process that triggers the
$\mathcal{K}(S_2)$ classification as co-stimulation. Note that if the
condition expressed in Eq. (6) is not fulfilled, a classification
based on $S_1$ alone suffices, since $\mathcal{K}(S_1)$ would then offer both
a classification error no higher than that of $\mathcal{K}(S_2)$ and a lower
feature cost.

Co-stimulation in the BIS can take the form of a feedback
loop [12], where one part of the BIS interacts with another part
and vice versa until the activation threshold for an immune
reaction is met. For ease of analyzing co-stimulation performance,
we decided not to consider such a feedback loop in this work.

B. Detailed Description

Co-stimulation can be implemented in ad hoc wireless networks
in several ways, depending on the application scenario,
the applied protocols or the expected misbehavior type. Let us
focus on an application scenario with the assumption that a
misbehaving node can be detected by its neighbors.

This translates to our architecture as follows: the node
$s_{i+2}$ computes the restricted feature set $\hat{f}_2$ (or $\hat{f}_1$). This
(co-stimulatory) feature set is then proliferated in the upstream
connection direction (towards the connection source); see Fig. 2.
Since the computation of $\hat{f}_2$ is time-window based, the
frequency with which $\hat{f}_2$ is sent depends on the time window
size. In our implementation, $\hat{f}_2$ was sent out immediately at
the end of each time window. Upon receiving $\hat{f}_2$, the node
$s_i$ compares it with its own $\hat{f}_2$ sample (later we introduce a
machine learning approach that implements this comparison
task). Based on this, a behavior classification with respect to
the node $s_{i+1}$ is made. If $s_i$ classifies $s_{i+1}$ as misbehaving,
it computes $\hat{f}_0$. If misbehavior is detected again, $s_{i+1}$ is
finally classified as misbehaving.

Before proceeding any further, we introduce the following
notation in order to keep track of composite feature sets
computed at $s_i$ and $s_{i+2}$: $\mathcal{F}_0 = f_0^{s_i} \cup f_0^{s_{i+2}}$,
$\mathcal{F}_1 = f_1^{s_i} \cup f_1^{s_{i+2}}$ and $\mathcal{F}_2 = f_2^{s_i} \cup f_2^{s_{i+2}}$.
Similarly, $\hat{\mathcal{F}}_0 = \hat{f}_0^{s_i} \circ \hat{f}_0^{s_{i+2}}$,
$\hat{\mathcal{F}}_1 = \hat{f}_1^{s_i} \circ \hat{f}_1^{s_{i+2}}$ and
$\hat{\mathcal{F}}_2 = \hat{f}_2^{s_i} \circ \hat{f}_2^{s_{i+2}}$, where $\circ$ is the
operator of vector concatenation. For simplicity, whenever
clear from the context, we omit the superscripts.

A formal description of our co-stimulation inspired approach
is presented in Alg. 1. With respect to the notation
introduced in Eq. (4), our co-stimulation inspired approach can
be succinctly expressed as:

$$\mathcal{K}(\mathcal{F}_2) \xrightarrow{\text{co-stimulation}} \mathcal{K}(f_0) \tag{7}$$
Algorithm 1 Co-stimulation based misbehavior detection at node $s_i$
Require: Sufficient data traffic in the current time window
Require: $\hat{f}_2^{s_{i+2}}$ from the 2-hop downstream node
1: procedure DETECT_MISBEHAVIOR
2:   boolean suspicious ← False
3:   $\hat{f}_2^{s_i}$ ← COMPUTE_f2_FEATURE_SAMPLE($s_i$)
4:   $\hat{\mathcal{F}}_2$ ← $\hat{f}_2^{s_i} \circ \hat{f}_2^{s_{i+2}}$
5:   if Classification($\hat{\mathcal{F}}_2$) == misbehavior then
6:     suspicious ← True
7:   end if
8:   if suspicious == True then   ▷ Co-stimulation: $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$
9:     $\hat{f}_0$ ← COMPUTE_f0_FEATURE_SAMPLE($s_i$)
10:    if Classification($\hat{f}_0$) == misbehavior then
11:      MARK_AS_MISBEHAVING($s_{i+1}$)   ▷ Misbehavior confirmed
12:    end if
13:  end if
14: end procedure
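The control flow of Alg. 1 can be sketched in Python as follows, assuming the two trained classifiers are passed in as callables returning `True` for misbehavior and that the costly $\hat{f}_0$ sample is produced lazily by a callable; all names are hypothetical. The sketch makes explicit that $\hat{f}_0$ is computed only when the first stage flags $s_{i+1}$.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def detect_misbehavior(
    f2_local: Vector,
    f2_downstream: Vector,
    classify_stage1: Callable[[Vector], bool],
    compute_f0_sample: Callable[[], Vector],
    classify_stage2: Callable[[Vector], bool],
) -> bool:
    """Return True if s_{i+1} should be marked as misbehaving.

    Stage 1 (cheap): classify the concatenation of the local f2 sample and
    the one received from the 2-hop downstream node.
    Stage 2 (costly, co-stimulated): only if stage 1 is suspicious, compute
    the local f0 sample and classify it."""
    F2 = list(f2_local) + list(f2_downstream)  # vector concatenation
    suspicious = classify_stage1(F2)
    if not suspicious:
        return False  # f0 is never computed: this is where energy is saved
    return classify_stage2(compute_f0_sample())
```

Passing `compute_f0_sample` as a callable rather than a precomputed vector mirrors the energy argument: the promiscuous-mode observations behind $f_0$ are only gathered after co-stimulation.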



In the following sections, we show, using experiments and
an energy model for an IEEE 802.11 wireless card, that Eq. (5)
and Eq. (6) hold in this case.

A few observations can be made at this stage. The proliferation
of $\hat{f}_2$ can be implemented without adding any extra
communication complexity by attaching this information to
CTS or ACK MAC packets (a modification of the MAC protocol
would be necessary). As long as there are DATA packets being
forwarded on this connection, the feature set can be proliferated.
If there is no DATA traffic on the connection (and thus no
CTS/ACK packets are exchanged), the relative necessity to detect
the possibly misbehaving node $s_{i+1}$ decreases. Alternatively,
the proliferation of $\hat{f}_2$ can be implemented by increasing the radio
range at $s_{i+2}$, by broadcasting it with a low time-to-live value
or by using a standalone packet type.

If the node $s_{i+1}$ decides not to cooperate in forwarding the
feature set information, the node $s_i$ will switch, after a
timeout, to $\hat{f}_0$ computation. In this respect, not receiving $\hat{f}_2$
can be understood as a form of negative co-stimulation. If the goal is
to detect a node controlled by an attacker, it is important that
the originator of $\hat{f}_2$ can be unambiguously identified, i.e.
authentication is necessary. An additional requirement is the
use of sequence numbers for $\hat{f}_2$. Otherwise, the misbehaving
node $s_{i+1}$ could interfere with the mechanism by forwarding
an outdated, cached $\hat{f}_2$.

In general, if $\hat{f}_0$ were proliferated instead of $\hat{f}_2$, the size of
the feature space at the receiving node $s_i$ would double, in our case
to 48 features. If higher statistical moments were computed
for all the features in $f_0$, the feature space size could easily
reach (or even surpass) 100, depending on the complexity
of the employed protocols. This turns detection into a typical
machine learning classification problem, which we discuss in the
following sections.

To summarize, after $s_i$ receives $f_2$: (i) it will process $f_2$ if
it does not originate from $s_{i+1}$, or (ii) it will forward $f_2$ to all
upstream neighbors otherwise. This means that each such feature
set travels only two hops. Notice that, in general, $s_{i+1}$ or $s_{i+2}$
can have several successor or predecessor nodes. This happens
if $s_{i+1}$ or $s_{i+2}$ forwards data packets for several connections.
More formally, we assume that $|\bullet s_{i+1}| \geq 1$, $|\bullet s_{i+2}| \geq 1$ and
$|s_{i+1} \bullet| \geq 1$, where $\bullet s_k$ and $s_k \bullet$ are the sets of all predecessor
and successor nodes of $s_k$, respectively. The upper limit for
the number of received $f_2$ per time window is thus the number
of connections for which $s_i$ forwarded data packets.
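The forwarding rule summarized above can be sketched as a small decision function. This is a simplified illustration under our own assumptions: node identities are plain strings, `successors` and `predecessors` stand for the sets $s_i \bullet$ and $\bullet s_i$, and any $f_2$ whose originator is not a direct successor is treated as having originated two hops downstream.

```python
from typing import Iterable, List, Set, Tuple


def handle_f2(
    originator: str, successors: Set[str], predecessors: Iterable[str]
) -> Tuple[str, List[str]]:
    """Decide what node s_i does with a received f2 feature set.

    successors:   the set s_i* of downstream neighbours (over all connections)
    predecessors: the set *s_i of upstream neighbours (over all connections)
    """
    if originator in successors:
        # f2 was computed one hop downstream: relay it to every upstream neighbour,
        # so that it reaches the node two hops upstream of its originator.
        return "forward", sorted(predecessors)
    # f2 was computed two hops downstream and relayed once: consume it locally.
    return "process", []
```

On a chain $s_0 \rightarrow s_1 \rightarrow s_2$, an $f_2$ computed at $s_2$ is forwarded by $s_1$ and processed by $s_0$, so it travels exactly two hops.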

C. Co-stimulation Analysis 

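Let us now take a closer look at the classification per-
formance of our co-stimulation approach. Let $\Omega$ be the set
of vectors to be classified. Let $\Omega_{\mathcal{F}_2} \subseteq \Omega$ be the
subset of vectors that were marked as suspicious (i.e. possibly
representing a misbehavior) after $\mathcal{K}(\mathcal{F}_2)$ was applied.

Let us first assume that $\text{class. error}^{\Omega}(\mathcal{K}(f_0)) = 0$. If $\mathcal{K}(f_0)$
is applied to $\Omega_{\mathcal{F}_2}$, it clearly holds:

$$\text{class. error}^{\Omega_{\mathcal{F}_2}}(\mathcal{K}(f_0)) = 0 \tag{8}$$

This implies that, for the final FP rate with respect to a misbe-
havior class $C_j$ after co-stimulation is applied, it holds:

$$\text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)) = 0 \tag{9}$$

In other words, the application of $\mathcal{K}(f_0)$ removes all vectors
that were misclassified by $\mathcal{K}(\mathcal{F}_2)$. Furthermore, the final
detection rate for the given class $C_j$ is determined by the
detection rate of the $\mathcal{K}(\mathcal{F}_2)$ classifier:

$$\text{det. rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)) = \text{det. rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2)) \tag{10}$$

The rationale is that only the vectors correctly classified by
$\mathcal{K}(\mathcal{F}_2)$ form a basis for the $\mathcal{K}(f_0)$ classification.

Since achieving $\text{class. error}^{\Omega}(\mathcal{K}(f_0)) = 0$ may not be
possible, Eq. (10) translates, for $\text{class. error}^{\Omega}(\mathcal{K}(f_0)) \neq 0$, to:

$$\text{det. rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)) \leq \text{det. rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2)) \tag{11}$$

Formulating a similar relationship for the false positives
rate is non-trivial since, in general, $\Omega_{\mathcal{F}_2}$ can be an arbitrary
subset of $\Omega$. However, if:

$$\text{FP rate}^{\Omega_{\mathcal{F}_2}}_{C_j}(\mathcal{K}(f_0)) = \text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(f_0)) \tag{12}$$

then it holds:

$$\text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)) \approx \text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(f_0)) \tag{13}$$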

The validity of this relationship can be estimated well
through experimental analysis. Our experimental setup and the
related results are presented in Sec. VIII and Sec. IX.
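The relationships in Eqs. (9)-(11) can be checked on toy data. The sketch below is a simplification: it merges all misbehavior classes into a single binary "misbehavior" label and defines the detection rate as TP/P and the FP rate as FP/N, which is our assumption about the per-class definitions. The label vectors are made up for illustration and are not experimental data; Eqs. (12)-(13) depend on how $\Omega_{\mathcal{F}_2}$ is distributed and are not checked here.

```python
from typing import List, Sequence, Tuple


def rates(truth: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """(detection rate, FP rate) for misbehavior (True) / normal (False) labels."""
    positives = sum(truth)
    negatives = len(truth) - positives
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    return tp / positives, fp / negatives


def cascade(pred_f2: Sequence[bool], pred_f0: Sequence[bool]) -> List[bool]:
    """K(F2) -> K(f0): a vector is reported only if both classifiers flag it."""
    return [a and b for a, b in zip(pred_f2, pred_f0)]


truth = [True, True, True, False, False, False, False, False]
pred_f2 = [True, True, False, True, True, False, False, False]
print(rates(truth, pred_f2))                  # K(F2) alone
print(rates(truth, cascade(pred_f2, truth)))  # cascade with a perfect K(f0)
```

With a perfect $\mathcal{K}(f_0)$ the cascade keeps the detection rate of $\mathcal{K}(\mathcal{F}_2)$ and removes all false positives, as in Eqs. (9) and (10); with an imperfect $\mathcal{K}(f_0)$ the detection rate can only drop, as in Eq. (11).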

VIII. Experimental Setup 

Topology, Connections, Data Traffic and Protocols: We
used a topology based on a snapshot of the movement
prescribed by the Random waypoint movement model [28].
There were 1,718 nodes simulated; the physical area size was
3,000 m × 3,000 m. We chose this topology because it was
studied quite extensively by Barrett et al. in [29]: the results
reported therein include many graph-theoretical measures that
were helpful in finding suitable parameters for our experiments.

We modeled data traffic as constant bit rate (CBR), i.e.
data packets were injected with a constant delay. This
constant delay was 2 seconds in our experiments (injection
rate of 0.5 packet/s); the packet size was 68 bytes. CBR data
packet sources correspond, e.g., to sensors that transmit their
measurements in predefined constant intervals. CBR can be
considered an extreme model for data packet injection due to
its synchronized nature (if a data packet collision occurs at a
node, there is a high chance that it occurs again in the future).
In fact, the results published in [30] show that when using a
stochastic injection model, such as the Poisson traffic model,
one can expect better performance of the detection system.

We used 50 concurrent connections. The connection length
was 7 hops. In order to represent a dynamically changing
system, we allowed connections to expire. An expired connec-
tion was replaced by another connection starting at a random
source node that had not been used previously. Each connection
was scheduled to exist for approximately 15 to 20 minutes. The
exact connection duration was computed as

$$S + r_u \times \Delta \tag{14}$$

where $S$ is the desired duration of a connection, $r_u$ is
a random number drawn from the uniform distribution on $[0, 1]$ and
$\Delta$ is the desired variation of the connection duration. In our
experiments, we used $\Delta = 5$ min.
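Eq. (14) amounts to drawing a duration uniformly between $S$ and $S + \Delta$. The sketch below assumes $S = 15$ min, inferred from the stated 15 to 20 minute range together with $\Delta = 5$ min. Python's `random.random()` samples from $[0, 1)$ rather than $[0, 1]$, a negligible difference here.

```python
import random
from typing import Optional


def connection_duration(
    s_min: float = 15.0, delta_min: float = 5.0, rng: Optional[random.Random] = None
) -> float:
    """Connection lifetime in minutes per Eq. (14): S + r_u * Delta.

    s_min = 15.0 is an assumption inferred from the 15-20 minute range.
    """
    rng = rng or random.Random()
    r_u = rng.random()  # uniform on [0, 1)
    return s_min + r_u * delta_min
```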

We used the AODV routing protocol, the IEEE 802.11b MAC
protocol, the UDP transport protocol and IPv4. The channel
frequency was set to 2.4 GHz. The transmission rate was set to
2 Mbps. We used the two-ray signal propagation model [31].
Antenna and signal propagation properties were set so that the
resulting radio radius equaled 100 meters.

Misbehavior models: We considered three types of misbe-
havior: (i) DATA packet dropping: 30% of DATA packets were
randomly and uniformly dropped at misbehaving nodes; (ii)
DATA packet delaying: 30% of DATA packets were randomly
and uniformly delayed by 0.1 second at misbehaving nodes;
(iii) wormholes [32]. Wormholes are private (out-of-band)
links between one or several pairs of nodes. They are added
by an attacker in order to attract data traffic into them and thus
gain control over packet routing and other network operations.
There were 20 wormholes in each simulation run; the length
of the wormholes was 15 hops, i.e. the source and sink were 15
hops apart before a given wormhole was activated.

There were 236 randomly chosen nodes executing DATA
dropping or delaying misbehavior. As the routing of packets is
hard to predict, many of these nodes could not execute any
misbehavior because there were no DATA packets for them to
forward. In our case, 236 misbehaving nodes resulted in about
20-30 actively (concurrently) misbehaving nodes.

In the case of the dropping and delaying misbehavior, our
intention was to model random failure occurrences, assuming
a uniform failure distribution in the network. The wormhole
misbehavior is an instance of misbehavior executed in collusion,
i.e. two nodes must closely cooperate. Any of these types of
misbehavior can have a significant impact on medium con-
tention resolution. DATA dropping removes packets from the
network and can thus decrease medium congestion. DATA de-
laying impacts the distribution of medium contention. Worm-
holes can cause severe changes to a network's topology and
therefore impact the quality of medium contention at certain
nodes (such as those lying on a network cut before a wormhole
was activated).

Experiments: We performed 20 independent runs for each misbe-
havior type and 20 misbehavior-free (normal) runs (4 × 20 runs
in total). The simulation time for each run was 4 hours. We
used a non-overlapping time window approach for the feature
computation, with four different time window sizes: 50,
100, 250 and 500 seconds. In the case of a 500-second time
window, there were 28 non-overlapping windows in each run
(4 hours/500 seconds = 28.8). This gave us 4 × 20 × 28 =
2,240 vectors (samples) for each node.
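The sample counts follow from integer arithmetic over complete, non-overlapping windows. The sketch below reproduces the 2,240 vectors for 500-second windows and applies the same rule to the other window sizes; those figures are only upper bounds, since windows without data traffic between $s_i$ and $s_{i+1}$ were later excluded. The constant and function names are ours.

```python
RUN_LENGTH_S = 4 * 60 * 60  # 4-hour simulation run
RUN_TYPES = 4               # normal, dropping, delaying, wormhole
RUNS_PER_TYPE = 20


def windows_per_run(win_size_s: int) -> int:
    """Number of complete, non-overlapping windows in one run."""
    return RUN_LENGTH_S // win_size_s


def max_vectors_per_node(win_size_s: int) -> int:
    """Upper bound on feature vectors per node before excluding idle windows."""
    return RUN_TYPES * RUNS_PER_TYPE * windows_per_run(win_size_s)


for w in (50, 100, 250, 500):
    print(f"{w:>3} s: {windows_per_run(w)} windows/run, {max_vectors_per_node(w)} vectors/node")
```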

Labeling and constructing the training and test sets: The
vectors in the training and test sets were labeled as follows: if
the node $s_{i+1}$ was misbehaving in a given time window, i.e.
dropping/delaying packets or acting as the start point of a wormhole,
the vector $\mathcal{F}_0$, $\mathcal{F}_1$ or $\mathcal{F}_2$ was labeled with the respective
misbehavior class. The vectors from normal runs were all
labeled as "normal". In order to simplify the experiments,
we only considered 20 distinct nodes with high traffic rates
and different node degrees. The nodes were chosen to ensure
that enough vectors representing each misbehavior class were
available. Notice that some vectors were excluded from the
experiments because there was no data traffic between $s_i$
and $s_{i+1}$.

Optimization algorithm: We used forward selection as the op-
timization algorithm for the wrapper approach [24]. This
algorithm starts with an empty feature set. The feature that
decreases the residual classification error the most is added
first. Following this greedy approach, new features are
added as long as they decrease the residual classification error.
Optimization by means of forward feature selection delivers a
normalized feature weight vector. These normalized weights
(ranging between 0.0 and 1.0) are proportional to the significance
of a given feature. A feature weight vector was computed for
each of the top 20 nodes; the average feature weight vector is
reported.
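The greedy forward selection described above can be sketched as follows. This is not the RapidMiner implementation: `error` is a hypothetical callable returning the (e.g. cross-validated) classification error of the induction algorithm trained on a given feature subset, and the sketch returns the selected features rather than a normalized weight vector.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str], error: Callable[[FrozenSet[str]], float]
) -> List[str]:
    """Greedy forward selection for a wrapper approach.

    Starts from the empty set and repeatedly adds the single feature that lowers
    the error the most; stops as soon as no remaining feature lowers it.
    """
    selected: List[str] = []
    remaining = list(features)
    best_err = error(frozenset())
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        cand_err, cand = min((error(frozenset(selected + [f])), f) for f in remaining)
        if cand_err >= best_err:
            break  # no feature decreases the residual error any further
        selected.append(cand)
        remaining.remove(cand)
        best_err = cand_err
    return selected
```

Each iteration retrains the classifier once per remaining feature, which is why a cheap induction algorithm matters for the wrapper method.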

Induction algorithm: We used a decision tree classifier. We
were interested in a less complex algorithm, since the wrapper
method requires the algorithm to be executed multiple times.
To decide whether a node within the decision tree should
be split further (impurity measure), we used the information
gain measure. As the decision tree classifier is a well-known
algorithm, we omit its discussion and refer the interested
reader to [24].
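For reference, the information gain of a split is the entropy of the class labels at a tree node minus the size-weighted entropy of its children. A minimal sketch for a binary split (the function names are ours):

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels at a tree node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(
    parent: Sequence[Hashable], left: Sequence[Hashable], right: Sequence[Hashable]
) -> float:
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
```

A split that separates "normal" from "dropping" vectors perfectly yields the maximal gain of 1 bit for two balanced classes; a split that leaves both children mixed yields 0.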

We used the implementations of the optimization and induction
algorithms from the RapidMiner tool [33]. RapidMiner is an
open-source tool for complex data mining tasks.

Parameters in the experimental setup are summarized in
Fig. 3.

IX. Performance Evaluation 
A. Basic Performance 

In this subsection, we discuss the misbehavior detection
performance of $\mathcal{K}(\mathcal{F}_0)$, $\mathcal{K}(\mathcal{F}_1)$ and $\mathcal{K}(\mathcal{F}_2)$. We first discuss
the performance of these three classifiers in isolation. The
performance of our co-stimulation based approach is discussed
in Sec. IX-C and Sec. IX-D.

The results with respect to the detection and false positives
rate are reported in Tables I and II. The detection and FP
rates were also computed for all types of misbehavior merged
into a single class. This allows for a better understanding of
"misbehavior/no-misbehavior" classification performance. It
can be observed that $\mathcal{K}(\mathcal{F}_0)$ performed very well with respect
to normal behavior and to dropping and delaying misbehavior.
In the case of the wormhole misbehavior, the FP rate was in the
range 4.35-9.81%. Both $\mathcal{K}(\mathcal{F}_1)$ and $\mathcal{K}(\mathcal{F}_2)$ performed much
worse than $\mathcal{K}(\mathcal{F}_0)$. The FP rate for $\mathcal{K}(\mathcal{F}_1)$ and $\mathcal{K}(\mathcal{F}_2)$ was in
many cases alarmingly high. Decreasing the time window size
delivered in many cases a statistically significant improvement.

Table III shows the confusion matrices for $\mathcal{K}(\mathcal{F}_0)$ and $\mathcal{K}(\mathcal{F}_2)$
with a window size of 50 seconds. A confusion matrix contains
information about actual and predicted classifications done by
a classification algorithm. Only the classification outcome for
the top 20 nodes is reported. Notice that sample sizes per node
were not identical. Therefore, the detection and FP rates
reported in Tables I and II are slightly different from the
detection and FP rates that can be derived from Table III. It can
be seen that the wormhole misbehavior was often misclassified
as normal behavior. The reason is that, according to our
wormhole model, a wormhole is used only if it lies on the
shortest path to a destination. This means that in some cases
only a small fraction of the data traffic received at $s_{i+1}$ was
taking "advantage" of the wormhole. Such a traffic pattern was
harder to distinguish from normal traffic. Note also that the
sample size for the wormhole misbehavior is smaller than for
the other classes. This is due to the fact that the number of
wormholes in a network must not exceed a certain limit;
otherwise the induced topological changes become extreme.
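Why rates derived from a pooled confusion matrix can differ from averaged per-node rates can be illustrated with one-vs-rest rates. In this sketch, rows are actual classes and columns predicted classes; the matrices are hypothetical toy data, and contrasting per-node averaging with pooling over nodes of unequal sample size is our illustration of the effect, not the exact procedure used for the tables.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # m[actual][predicted]


def one_vs_rest(m: Matrix, j: int) -> Tuple[float, float]:
    """Detection rate and FP rate (in %) of class j; all other classes are negatives."""
    k = len(m)
    tp = m[j][j]
    positives = sum(m[j])
    fp = sum(m[i][j] for i in range(k) if i != j)
    negatives = sum(sum(m[i]) for i in range(k) if i != j)
    return 100.0 * tp / positives, 100.0 * fp / negatives


def pooled(matrices: Sequence[Matrix]) -> Matrix:
    """Element-wise sum of per-node confusion matrices."""
    k = len(matrices[0])
    return [[sum(m[r][c] for m in matrices) for c in range(k)] for r in range(k)]


# Two hypothetical nodes with different sample sizes (0 = normal, 1 = misbehavior).
node_a = [[90, 10], [5, 95]]
node_b = [[18, 2], [10, 10]]
avg_det = (one_vs_rest(node_a, 1)[0] + one_vs_rest(node_b, 1)[0]) / 2
pooled_det = one_vs_rest(pooled([node_a, node_b]), 1)[0]
print(avg_det, pooled_det)
```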

An interesting question is which features contributed the
most to the overall performance. Feature weights are reported
in Table IV (only features with a weight greater than or equal to
0.25 are shown). Features that were computed locally (at node
$s_i$) are appended with "_L"; features that were received from a
remote node ($s_{i+2}$) are appended with "_R" (see Fig. 2).

$\mathcal{K}(\mathcal{F}_0)$ was dominated by the watchdog features, with M3_L
and M4_L having the highest weights. $\mathcal{K}(\mathcal{F}_2)$ with a 50 s time
window was dominated by R9_L, T1_L, T3_L, T1_R and
T3_R. Packet dropping and delaying classification was based
on the features T1_L, T3_L, T1_R and T3_R (we inspected the
actual rules computed by the decision tree classifier). In this
case, the classification was based on learning the differences
in traffic at nodes $s_i$ and $s_{i+2}$.

In the absence of the watchdog features, the precision of
wormhole classification benefited from the two topology
features R5_L and R9_L; the reason is that wormholes in-
crease the number of nodes that lie in a node's neighborhood.
This removes the need to rely on the features received
from $s_{i+2}$, since under wormhole misbehavior $s_{i+2}$ might be
identical with a wormhole's end point. We would like to point
out that even though R5_L can easily be manipulated
by the wormhole, R9_L is much more resilient. This feature
reflects the increased routing utility of nodes that lie in the
neighborhood of a wormhole. It should also be noted that
when using $\mathcal{F}_0$, wormhole detection is partly based on the
M3_L watchdog feature, since wormholes, by sending over
a private link, appear not to be forwarding packets. For the
same reason, in more dynamic scenarios, the existence of a
wormhole might also get exposed if medium congestion at
its start point drops significantly.

Another result that can be seen in Table IV is the insignificance
of the MAC features M1 and M2. Both M1 and M2 indirectly
measure the medium congestion around a node. This means
that, should a node be dropping packets and thus decreasing the
need for medium access, this should get detected with the
help of M1. On the other hand, any packet dropping should
also be detected directly by the watchdog feature M3. We
expected that as M1 increases in value, M3 would
decrease accordingly, i.e. M1 and M3 would form
a kind of dynamic equilibrium. This, however, could not be
demonstrated in our setup. The reason seems to be
the relatively low data packet injection rate (0.5 packet/s)
combined with the additional dropping of data packets (which
further decreases the number of data packets being forwarded). We
believe that studying such equilibria could improve the performance
of misbehavior detection, at least in more dynamic scenarios
than the one presented herein.
1) Induction algorithm: Decision tree with information gain as the impurity measure.

2) Feature selection algorithm: Forward feature selection.

3) Validation approach: n-fold cross-validation with n = 20.

4) Misbehavior types: Packet dropping, packet delaying, wormholes.

5) Performance measures: Classification error, detection rate and false positives rate; their arithmetic average and 95% confidence
intervals.

6) Network topology: Snapshot of movement modeled by the random waypoint mobility model, i.e. a static network. There were
1,718 nodes. The area was 2,900 m × 2,950 m. The transmission range of the transceivers was 100 meters.

7) Number of connections: 50 CBR (constant bit rate) connections. MAC protocol: IEEE 802.11b DCF. Routing protocol: AODV.
Other parameters: (i) propagation path-loss model: two-ray; (ii) channel frequency: 2.4 GHz; (iii) topography: line-of-sight; (iv)
radio type: Accnoise; (v) network protocol: IPv4; (vi) transport protocol: UDP.

8) Injection rate: 0.5 packet/second. Data packet size: 68 bytes.

9) Number of independent simulation runs for each combination of input parameters: 20. Simulation time: 4 hours.

10) Simulator used: JiST/SWANS; hardware used: 30× Linux (SuSE 10.0) PCs with 2 GB RAM and a Pentium 4 3 GHz microprocessor.

Fig. 3. Parameters used in the experiment.
TABLE I
Detection rate.





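| Classifier | Win. size [s] | Normal: FP rate [%] | CI95% | Dropping: FP rate [%] | CI95% | Delaying: FP rate [%] | CI95% | Wormhole: FP rate [%] | CI95% | Any misbehavior: FP rate [%] | CI95% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{K}(\mathcal{F}_0)$ | 50 | 1.04 | 0.33 | 0.39 | 0.13 | 0.78 | 0.17 | 4.35 | 1.67 | 0.94 | 0.23 |
| | 100 | 1.30 | 0.41 | 0.11 | 0.08 | 0.61 | 0.26 | 5.04 | 1.81 | 0.88 | 0.28 |
| | 250 | 1.48 | 0.39 | 0.34 | 0.30 | 0.49 | 0.33 | 8.32 | 3.04 | 1.44 | 0.62 |
| | 500 | 1.72 | 0.57 | 0.74 | 0.77 | 0.62 | 0.44 | 9.81 | 3.75 | 1.77 | 0.66 |
| $\mathcal{K}(\mathcal{F}_1)$ | 50 | 3.73 | 0.57 | 3.46 | 0.73 | 5.47 | 0.77 | 8.12 | 2.61 | 3.63 | 0.54 |
| | 100 | 5.59 | 0.84 | 4.52 | 0.98 | 8.60 | 0.86 | 9.55 | 3.35 | 5.10 | 0.53 |
| | 250 | 9.02 | 0.97 | 6.46 | 1.99 | 15.61 | 2.18 | 16.86 | 5.91 | 9.41 | 1.19 |
| | 500 | 13.74 | 1.63 | 10.57 | 3.36 | 25.96 | 3.05 | 19.85 | 6.95 | 15.29 | 1.76 |
| $\mathcal{K}(\mathcal{F}_2)$ | 50 | 3.71 | 0.57 | 3.44 | 0.71 | 5.28 | 0.76 | 8.23 | 2.60 | 3.55 | 0.52 |
| | 100 | 5.68 | 0.86 | 4.40 | 0.99 | 8.38 | 0.91 | 6.69 | 2.12 | 4.65 | 0.44 |
| | 250 | 9.11 | 0.94 | 6.22 | 1.85 | 15.38 | 2.06 | 16.05 | 5.91 | 9.28 | 1.25 |
| | 500 | 13.54 | 1.81 | 11.08 | 3.73 | 25.24 | 3.56 | 19.81 | 7.52 | 15.09 | 2.27 |

TABLE II
False positives rate.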



B. Implications for Design of Autonomous Detection Systems 

The results in Tables I and IV demonstrate the relative
strength of the watchdog features used. It is clear that they can
be used almost alone and that their classification ability is better
than the combined classification ability of the features T1_L-T5_L
and T1_R-T5_R. Notice that, according to Table IV, only
the (local) $f_0$ component in $\mathcal{F}_0$ is effectively used. From the
results it can be concluded that, as the size of the time window
decreases, the classification ability of $\mathcal{K}(f_0)$ and $\mathcal{K}(\mathcal{F}_2)$ will
equalize. That is:

$$\lim_{\text{win. size} \to 0} \text{class. error}^{\Omega}(\mathcal{K}(\mathcal{F}_2)) \approx \text{class. error}^{\Omega}(\mathcal{K}(f_0)) \tag{15}$$

(a) $\mathcal{K}(\mathcal{F}_0)$. Avg. sample size per node = 4,096.
(b) $\mathcal{K}(\mathcal{F}_2)$. Avg. sample size per node = 3,789.

TABLE III
Confusion matrices for (a) $\mathcal{K}(\mathcal{F}_0)$ and (b) $\mathcal{K}(\mathcal{F}_2)$. Window size = 50 seconds. 1 = normal, 2 = dropping, 3 = delaying, 4 = wormhole.

TABLE IV
Feature weights. Window size = {500 s, 50 s}. A blank field means that either the feature does not belong to the feature set or its value was < 0.25.

Eq. (15) characterizes the relationship between watchdog and
$\mathcal{F}_2$ based misbehavior detection. It points out that, instead of
node $s_i$ observing each data packet's delivery in promiscuous
mode, the same can be done equally well in a cooperative
way by $s_i$ and $s_{i+2}$, if $\text{win. size} \to 0$.

In other words, if the time window is small enough that it
always includes only a single event, the relationship between
the events at $s_i$ and $s_{i+2}$ becomes explicit, i.e. a data packet
received at $s_{i+2}$ can be unambiguously matched with a data
packet forwarded by one of its two-hop neighbors. Decreasing
the time window size, however, comes with a high
communication cost (each packet arrival at $s_{i+2}$ is explicitly
reported to $s_i$). For $f_0$ based misbehavior detection, this
relationship is straightforward since $s_i$ only evaluates data
packets that it has just sent.

In terms of learning complexity, the fundamental differences
between these two approaches are:

• If using $f_0$, the send-overhear relationship is always
explicit since $s_i$ can directly observe the forwarding of
the data packet that it has just sent to $s_{i+1}$. Therefore, the
classification task is straightforward.

• If using $\mathcal{F}_2$ with $\text{win. size} \gg 0$, learning based on
data traffic at both $s_i$ and $s_{i+2}$ must be done. Feature
averaging over a time window increases the complexity of the
classification task.

The above two observations and Eq. (15) offer a rough
characterization of the trade-offs between detection approaches
executed by a single node and by several nodes in cooperation.

C. Co-stimulation Based Misbehavior Detection 

The rule formulated in Eq. (15) motivates the following
strategy: approximate $\mathcal{K}(f_0)$ with $\mathcal{K}(\mathcal{F}_2)$. If the $\mathcal{K}(\mathcal{F}_2)$ based
classification hints at a possibility of misbehavior, then employ
$\mathcal{K}(f_0)$ to get a more reliable prediction. There are two basic
reasons for applying this strategy: (i) computing $\mathcal{F}_2$ is more
energy efficient and (ii) it allows for an energy efficient
implementation of the co-stimulation approach presented in
Alg. 1.

The results achieved by co-stimulation are reported in
Tables V and VI. The classification performance evaluation of
$\mathcal{K}(\mathcal{F}_2)$, $\mathcal{K}(f_0)$ and $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$ is based on three separate
cross-validation experiments. It can be seen that for the four
considered time window sizes, $\mathcal{K}(\mathcal{F}_2)$ and $\mathcal{K}(f_0)$ fulfill the
condition formulated in Eq. (6). The reported performance,
taking into consideration the corresponding CI95%, is also in
line with Eqs. (11) and (13). Most notably, it can be seen that,
except for wormholes, the following holds:

$$\text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)) \approx \text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(f_0)) \tag{16}$$

The results show that for win. size = 50 s, 18.2% and
13.3% of wormholes were misclassified as belonging to the "nor-
mal" class when $\mathcal{K}(\mathcal{F}_2)$ and $\mathcal{K}(f_0)$ were applied, respectively.
Two factors contributed to this result: (i) a wormhole
start node can appear to "lose" a variable number of data
packets depending on the wormhole utilization in a given time
window, and (ii) the features used were not sufficient for this
type of misbehavior.

D. Analysis of Co-stimulation Energy Efficiency 

The rationale for applying co-stimulation rests on its ability
to improve energy efficiency. Let us therefore formulate
an energy model for co-stimulation.

Unlike when $\mathcal{K}(f_0)$ is applied exclusively, in our ap-
proach $\mathcal{K}(f_0)$ is used only if (i) a true positive was
detected by $\mathcal{K}(\mathcal{F}_2)$ or (ii) a false positive was mistakenly
detected by $\mathcal{K}(\mathcal{F}_2)$. This means that, for a misbehavior-free ad hoc
| Classifier | Win. size [s] | Dropping: Det. rate [%] | CI95% | Delaying: Det. rate [%] | CI95% | Wormhole: Det. rate [%] | CI95% | Any misbehavior: Det. rate [%] | CI95% |
|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{K}(\mathcal{F}_2)$ | 50 | 94.28 | 1.10 | 92.89 | 0.89 | 82.21 | 5.91 | 93.58 | 0.99 |
| | 100 | 91.71 | 1.36 | 88.53 | 1.36 | 78.21 | 6.85 | 90.30 | 1.28 |
| | 250 | 88.37 | 2.55 | 78.65 | 1.87 | 73.91 | 8.48 | 84.19 | 1.49 |
| | 500 | 84.13 | 4.15 | 64.64 | 3.41 | 71.16 | 9.01 | 76.40 | 2.53 |
| $\mathcal{K}(f_0)$ | 50 | 99.37 | 0.17 | 99.39 | 0.16 | 87.59 | 5.84 | 98.15 | 0.62 |
| | 100 | 99.64 | 0.22 | 99.51 | 0.22 | 84.15 | 6.68 | 97.75 | 0.71 |
| | 250 | 99.27 | 0.40 | 99.47 | 0.26 | 84.49 | 4.38 | 97.50 | 0.66 |
| | 500 | 98.81 | 0.67 | 99.01 | 0.44 | 83.63 | 5.03 | 97.19 | 0.85 |
| Co-stimulation: $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$ | 50 | 95.02 | 0.92 | 93.62 | 0.87 | 81.00 | 6.24 | 93.42 | 0.72 |
| | 100 | 93.53 | 1.17 | 89.39 | 1.51 | 75.06 | 7.35 | 90.00 | 0.84 |
| | 250 | 89.68 | 2.06 | 79.47 | 1.82 | 71.09 | 8.22 | 84.70 | 1.06 |
| | 500 | 85.82 | 3.81 | 68.99 | 3.47 | 69.38 | 9.50 | 78.89 | 1.71 |

TABLE V
Co-stimulation performance: detection rate.





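| Classifier | Win. size [s] | Dropping: FP rate [%] | CI95% | Delaying: FP rate [%] | CI95% | Wormhole: FP rate [%] | CI95% | Any misbehavior: FP rate [%] | CI95% |
|---|---|---|---|---|---|---|---|---|---|
| $\mathcal{K}(\mathcal{F}_2)$ | 50 | 3.44 | 0.71 | 5.28 | 0.76 | 8.23 | 2.60 | 3.55 | 0.52 |
| | 100 | 4.40 | 0.99 | 8.38 | 0.91 | 6.69 | 2.12 | 4.65 | 0.44 |
| | 250 | 6.22 | 1.85 | 15.38 | 2.06 | 16.05 | 5.91 | 9.28 | 1.25 |
| | 500 | 11.08 | 3.73 | 25.24 | 3.56 | 19.81 | 7.52 | 15.09 | 2.27 |
| $\mathcal{K}(f_0)$ | 50 | 0.39 | 0.13 | 0.78 | 0.17 | 4.35 | 1.67 | 0.94 | 0.23 |
| | 100 | 0.11 | 0.08 | 0.61 | 0.26 | 5.04 | 1.81 | 0.88 | 0.28 |
| | 250 | 0.34 | 0.30 | 0.49 | 0.33 | 8.32 | 3.04 | 1.44 | 0.62 |
| | 500 | 0.74 | 0.77 | 0.62 | 0.44 | 9.81 | 3.75 | 1.77 | 0.66 |
| Co-stimulation: $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$ | 50 | 0.48 | 0.14 | 0.45 | 0.17 | 3.25 | 1.51 | 0.98 | 1.36 |
| | 100 | 0.38 | 0.19 | 0.31 | 0.16 | 5.14 | 2.41 | 1.26 | 1.75 |
| | 250 | 0.97 | 0.54 | 0.40 | 0.21 | 3.54 | 1.21 | 1.28 | 1.73 |
| | 500 | 1.19 | 0.70 | 0.44 | 0.28 | 4.64 | 2.13 | 1.67 | 2.59 |

TABLE VI
Co-stimulation performance: FP rate.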



network, the energy saving over an exclusive $\mathcal{K}(f_0)$ approach
is related to $\text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2))$. We focus on the energy
efficiency analysis in a misbehavior-free ad hoc network, since
it is reasonable to assume that an ad hoc network will work
reliably most of the time.

With respect to the above, we assume the following
energy model for co-stimulation in a misbehavior-free ad hoc
network:

$$\xi(n) = \xi_{\mathcal{F}_2} + \text{FP rate}^{\Omega}_{C_j}(\mathcal{K}(\mathcal{F}_2)) \times \xi_{f_0}(n) \tag{17}$$

where $n$ is the number of data packets that need to be
overheard by $s_i$ in promiscuous mode for the purpose of $f_0$
based classification, $\xi(n)$ is the total energy consumption after
co-stimulation in a time window of length win. size,
$\xi_{\mathcal{F}_2}$ is the energy consumption related to the computation
of $\mathcal{F}_2$ and $\xi_{f_0}(n)$ is the energy consumption related to the
computation of $f_0$. In the following, we also assume that $C_j$ =
"any misbehavior". Additionally, we assume that decision tree
query costs are negligible compared to feature computation
costs, which are dominated by communication costs.



The choice of $n$ depends on the expected data traffic pattern
as well as on the type of misbehavior. With respect to our
experimental setup, we assume that $\mathcal{K}(f_0)$ is based on a 50-
second time window. Note that if $s_{i+1}$ forwards data
packets for a single connection, then $n = 0.5 \times \text{win. size}$
under our data traffic model, i.e. we consider $n = 25$.

Energy Consumption Analysis: Feeney and Nilsson inves-
tigated in [26] the energy consumption of a Lucent 2 Mbps
IEEE 802.11 wireless card. They concluded that energy con-
sumption can be modeled as $a \times \text{size} + b$, where size is the
data packet size in bytes. The constants $a$ and $b$ reflect the
consumption in μJ when sending ($a = 1.9$, $b = 454$), receiv-
ing ($a = 0.5$, $b = 356$) or overhearing ($a = 0.39$, $b = 140$) a
data packet. This translates into the following energy models
for $\mathcal{K}(f_0)$ and $\mathcal{K}(\mathcal{F}_2)$, respectively:

$$\xi_{f_0}(n) = n \times (0.39 \times \text{size}(\text{data}) + 140) \tag{18}$$

$$\xi_{\mathcal{F}_2} = 2 \times \big((1.9 \times \text{size}(f_2) + 454) + (0.5 \times \text{size}(f_2) + 356)\big) \tag{19}$$

where size(data) is the data packet size in bytes and size($f_2$)
is the size, in bytes, of the data packet carrying $f_2$. Eq. (18)
reflects the fact that in promiscuous mode the energy consump-
tion grows linearly with the number of data packets overheard.
Eq. (19) reflects the cost of sending $f_2$ over two hops to $s_i$.

| Win. size [s] | FP rate of $\mathcal{K}(\mathcal{F}_2)$ [%] | $\xi(25)$ [mJ] | $\xi'(25)$ [mJ] | $\gamma(25)$ [%] | $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$: Det. rate [%] | $\mathcal{K}(\mathcal{F}_2) \rightarrow \mathcal{K}(f_0)$: FP rate [%] |
|---|---|---|---|---|---|---|
| 50 | 3.55 | 2.33 | 23.29 | 82.73 | 93.42 | 0.98 |
| 100 | 4.65 | 2.48 | 12.39 | 90.81 | 90.00 | 1.26 |
| 250 | 9.28 | 3.10 | 6.20 | 95.40 | 84.70 | 1.28 |
| 500 | 15.09 | 3.89 | 3.89 | 97.12 | 78.89 | 1.67 |

TABLE VII
Co-stimulation: energy consumption and misbehavior detection performance.

The results for energy efficiency using co-stimulation are
shown in Table VII. We applied the following parameters:
size(data) = 1 kB and size($f_2$) = 48 B + size(header)
(corresponding to 24 features with 2 bytes reserved for each
feature; size(header) = 0 if $f_2$ is transported using
piggybacking). $\xi'(n)$ and $\gamma(n)$ are defined as follows:

$$\xi'(n) = \frac{500.0}{\text{win. size}} \times \xi(n), \qquad \gamma(n) = 1.0 - \frac{\xi'(n)}{\xi_{f_0}(250)} \tag{20}$$

$\xi'(n)$ is the adjusted energy consumption for a 500-second
time period. $\gamma(n)$ is the energy saving relative to $\xi_{f_0}(250)$, i.e.
relative to the energy consumption of $f_0$ based classification
in a 500-second time window.
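Eqs. (17)-(20) can be evaluated directly. The sketch below assumes 1 kB = 1,024 bytes and size($f_2$) = 48 bytes (piggybacked, no header); under these assumptions it reproduces the energy columns of Table VII up to rounding from the "any misbehavior" FP rates of $\mathcal{K}(\mathcal{F}_2)$. Energies are computed in μJ and reported in mJ; the function names are ours.

```python
# Linear energy model a * size + b (in uJ), constants as quoted from Feeney and Nilsson.
SEND = (1.9, 454.0)
RECEIVE = (0.5, 356.0)
OVERHEAR = (0.39, 140.0)


def cost_uj(model, size_bytes):
    a, b = model
    return a * size_bytes + b


def xi_f0(n, size_data):
    """Eq. (18): overhearing n data packets in promiscuous mode."""
    return n * cost_uj(OVERHEAR, size_data)


def xi_f2(size_f2):
    """Eq. (19): sending and receiving f2 over two hops."""
    return 2 * (cost_uj(SEND, size_f2) + cost_uj(RECEIVE, size_f2))


def table_vii_row(win_size_s, fp_rate_pct, n=25, size_data=1024, size_f2=48):
    """Return (xi(n) [mJ], xi'(n) [mJ], gamma(n) [%]) per Eqs. (17) and (20)."""
    xi = xi_f2(size_f2) + fp_rate_pct / 100.0 * xi_f0(n, size_data)
    xi_adj = 500.0 / win_size_s * xi
    # Reference: f0 over 500 s at 0.5 packet/s, i.e. 250 overheard packets.
    gamma = 1.0 - xi_adj / xi_f0(250, size_data)
    return xi / 1000.0, xi_adj / 1000.0, 100.0 * gamma


for win, fp in [(50, 3.55), (100, 4.65), (250, 9.28), (500, 15.09)]:
    xi, xi_adj, gamma = table_vii_row(win, fp)
    print(f"{win:>3} s: xi={xi:.2f} mJ, xi'={xi_adj:.2f} mJ, gamma={gamma:.2f} %")
```

Larger windows lower $\xi'(25)$ because $\xi_{\mathcal{F}_2}$ is paid less often, even though the FP rate of $\mathcal{K}(\mathcal{F}_2)$ grows.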

Inspecting Table VII, it can be seen that under our exper-
imental setup co-stimulation can save up to 97.12% energy.
It can also be seen that co-stimulation offers the possibility to
choose a trade-off between energy consumption and detection
performance. This can be seen by comparing $\gamma(25)$ and the
right-hand side of Table VII, where the classification results
for the "any misbehavior" class are shown.

Fig. 4(a) shows the accumulated energy consumption using 500-
second time windows. It can be seen that after 60
minutes of operation, the energy saving in a misbehavior-free
ad hoc wireless network for a data packet size of 1 kB is
nearly two orders of magnitude. For a smaller data packet
size of 64 bytes, more typical for wireless sensor networks, the
energy saving is more than one order of magnitude. Fig. 4(b)
shows the dependence of energy consumption on
$\text{FP rate}(\mathcal{K}(\mathcal{F}_2))$.

By solving $\xi_{f_0}(n) = \xi_{\mathcal{F}_2}$, we can show that $\xi_{\mathcal{F}_2} < \xi_{f_0}(n)$
if $n > 3.43$, i.e. the condition formulated in Eq. (5) is satisfied
for any reasonable time window size.
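The break-even point follows from equating Eqs. (18) and (19) and solving for $n$. A minimal sketch, assuming size($f_2$) = 48 bytes and 1 kB = 1,024 bytes as above (the function names are ours):

```python
def xi_f2(size_f2: int = 48) -> float:
    """Eq. (19) in uJ; 48 bytes = 24 features x 2 bytes, piggybacked."""
    return 2 * ((1.9 * size_f2 + 454) + (0.5 * size_f2 + 356))


def break_even_packets(size_data: int) -> float:
    """Solve n * (0.39 * size_data + 140) = xi_F2 for n (Eq. (18) = Eq. (19))."""
    return xi_f2() / (0.39 * size_data + 140)


print(round(break_even_packets(1024), 2))  # 3.43
```

Smaller data packets raise the threshold, since each overheard packet is cheaper, but it remains far below the 25 packets per 50-second window assumed here.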

E. Further Discussion of Results 

We restricted ourselves to three basic types of misbehav-
ior. Our goal was to investigate one representative from the
qualitative group of misbehavior types (packet dropping), one
from the quantitative group (packet delaying) and one from the
topology group (wormholes). Many types of misbehavior can
be classified within these three groups. For example, a data
packet manipulation attempt falls within the qualitative group,
since it can be detected by monitoring the data packet size
and/or correction code, learning the usual manipulation rate
(due to, e.g., a routing protocol) and subsequent classification.

Our approach to misbehavior detection can be considered
very general. It is well known that some types of misbehavior
can be detected efficiently if, for example, time synchronization
or GPS (Global Positioning System) information is available;
see, e.g., the results on packet leashes for wormhole detection
due to Hu et al. [32]. Such specialized capabilities would,
however, interfere with our intention to contrast the implications
of non-cooperative versus cooperative detection ($\mathcal{K}(f_0)$ vs.
co-stimulation based misbehavior detection).

The results presented in Tables I, II and V offer good
performance guidance for detection systems that aim at prof-
iting from traffic measurements in the neighborhood of a
misbehaving node. Even though misbehavior detection can
also be done by an independent third-party node that does not
lie on the given connection but can easily overhear the
transmission, its observations with respect to the misbehaving
node are not going to be more precise than the combined
information from nodes $s_i$ and $s_{i+2}$. Such a scenario also
implies continuous operation in promiscuous mode. In
addition to energy inefficiency, operation in promiscuous mode
increases vulnerability to intrusions since, once a data packet
is intercepted and (temporarily) stored in memory, it
allows for an application of the techniques described in [14].
This can lead to execution of the code carried in the packet's
data portion. Unlike in usual packet forwarding, (i) such a
scenario would simplify choosing a specific victim and (ii)
it would be harder to detect, because other nodes might be
unaware of the fact that this node was able to overhear the
transmission.

We concentrated on features that can be computed without
much computational overhead. Our results could have been
somewhat different if a more complex Fourier or wavelet
analysis of the packet stream had been done. As the results by
Barford et al. [34] point out, this could lead to good anomaly
detection rates.

The classification in our setup is done on a single time
window basis. This benefits energy efficiency, as
a post-processing phase (statistical analysis, clustering) can
be avoided. It also positively impacts the time to detect a
misbehavior.

The BIS is an inherently distributed system with function-
ality that can be very hard to mimic. For example, the success
of negative selection, a learning mechanism applied in the
training and priming of T-cells in the thymus, rests on the
efficiency of the blood-thymic barrier, which guarantees that
the thymus stays pathogen-free at all times. This implies
that T-cells being trained in the thymus never encounter any
type of pathogen before being released into the body. This helps
tremendously in detecting foreign cells. Mapping the functionality
of the BIS to a computational paradigm is a hot topic within
the AIS community. Our goal was to concentrate on the
simplest mechanisms. Most notably, we were motivated by
the interplay between the innate and adaptive immune system.

The mechanisms of the innate immune system bear a certain
resemblance to the F0 based classification approach. For
example, the innate system is very efficient at signaling tissue
injury or damage to the adaptive immune system. As pointed
out before, this ability has developed over evolutionary time,
yet it relies on very rudimentary methods such as recognizing
an unusually high level of dead or damaged self cells (e.g.,
blood cells). This can be directly compared with the very
straightforward functionality of watchdogs. Similarly, the more
machine-learning-intensive classification approach based on F2
can, in our opinion, be compared with the adaptive immune
system.

X. Conclusions 

We presented and experimentally evaluated a novel
immuno-inspired, energy-efficient approach to misbehavior
detection in ad hoc wireless networks. We demonstrated that,
in our experimental setup, the (communication-related) energy
saving relative to watchdog monitoring is nearly two orders of
magnitude for 1 kB data packets. For smaller data packets of
64 bytes, which are more typical of wireless sensor networks,
the energy saving exceeds one order of magnitude.

We achieved good control of the false positive rate, which is
important for reducing the maintenance costs of ad hoc
networks. We also showed that our co-stimulation approach
becomes cheaper than its watchdog-based counterpart at a time
window size of 6.86 seconds.
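Why a break-even window size exists can be seen from a deliberately simple linear energy model, offered as a hypothetical illustration rather than the model behind our measurements. A watchdog pays a receive cost for every packet it overhears, whereas co-stimulation pays a fixed cost per time window for exchanging one message. The function names and all parameter values are assumptions; they do not reproduce the 6.86 s result.

```python
def watchdog_power_mj_per_s(rate_pps: float, overhear_mj_per_pkt: float) -> float:
    """Watchdog: the monitoring node overhears every forwarded packet."""
    return rate_pps * overhear_mj_per_pkt


def costim_power_mj_per_s(window_s: float, exchange_mj: float) -> float:
    """Co-stimulation: one fixed-cost message exchange per time window."""
    return exchange_mj / window_s


def break_even_window_s(rate_pps: float, overhear_mj_per_pkt: float,
                        exchange_mj: float) -> float:
    """Window length above which co-stimulation is cheaper in this model."""
    return exchange_mj / (rate_pps * overhear_mj_per_pkt)
```

With an assumed 2 packets/s, 0.5 mJ per overheard packet and 8 mJ per exchange, the break-even window is 8.0 s; a 16 s window then costs 0.5 mJ/s versus 1.0 mJ/s for the watchdog. In such a model the per-packet overhearing cost grows with packet size, which is consistent with the saving being larger for 1 kB packets than for 64-byte packets.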



We demonstrated a relationship between energy efficiency and
detection performance. Our approach can thus be used to find
a trade-off between misbehavior detection performance and
energy efficiency.

We applied three different node misbehavior models: data
packet dropping, data packet delaying and wormholes. A
wormhole can only be formed if two nodes closely cooperate,
i.e., it is an instance of misbehavior carried out in collusion.

Note that our results are independent of the global network
topology, since our approach only considers a two-hop segment
of a connection. Because we evaluated performance at 20
distinct nodes with different node degrees, our results provide
a robust estimate of the expected performance.

Motivated by the results presented herein, Drozda et al.
investigated an error propagation algorithm [35] that takes
advantage of Eq. (15) in order to induce systemic resistance
against misbehavior. The algorithm also removes the reliance
on a labeled dataset for learning normal behavior and
misbehavior.

The datasets used in our experiments are available for
download [36].